Joint word2vec Networks for Bilingual Semantic Representations

Authors

  • Lior Wolf
  • Yair Hanani
  • Kfir Bar
  • Nachum Dershowitz
Abstract

We extend the word2vec framework to capture meaning across languages. The input consists of a source text and a word-aligned parallel text in a second language. The joint word2vec tool then represents words in both languages within a common “semantic” vector space. The result can be used to enrich lexicons of under-resourced languages, to identify ambiguities, and to perform clustering and classification. Experiments were conducted on a parallel English-Arabic corpus, as well as on English and Hebrew Biblical texts.
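
One use the abstract mentions is enriching lexicons. The following is a minimal sketch of that idea, not the authors' implementation: it assumes words from both languages already have vectors in one shared space and ranks candidate translations by cosine similarity. The function names and the tiny hand-made vectors (with transliterated Hebrew words) are hypothetical and serve only as illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_translations(word, source_vectors, target_vectors, k=1):
    """Return the k target-language words closest to `word` in the shared space."""
    query = source_vectors[word]
    ranked = sorted(
        target_vectors,
        key=lambda t: cosine(query, target_vectors[t]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy vectors, invented for illustration (not trained embeddings).
english = {"king": [0.9, 0.1, 0.0], "water": [0.0, 0.2, 0.9]}
hebrew = {"melekh": [0.8, 0.2, 0.1], "mayim": [0.1, 0.1, 0.95]}

for w in english:
    print(w, "->", nearest_translations(w, english, hebrew))
```

In practice the vectors would come from a model trained on the word-aligned parallel text; the ranking step itself does not depend on how they were obtained.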

Similar resources

Low-resource bilingual lexicon extraction using graph based word embeddings

In this work we focus on the task of automatically extracting a bilingual lexicon for the language pair Spanish-Nahuatl. This is a low-resource setting where only a small parallel corpus is available. Most of the downstream methods do not work well under low-resource conditions. This is especially true for approaches that use vector representations like Word2Vec. Our proposal is ...

MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words). MultiVec includes Mikolov et al. [2013b]’s word2vec features, Le and Mikolov [2014]’s paragraph vector (batch and online) and Luong et al. [2015]’s model for bilingual distributed representations. MultiVec also includes different distance measu...

Learning Bilingual Distributed Phrase Representations for Statistical Machine Translation

Following the idea of using distributed semantic representations to facilitate the computation of semantic similarity between translation equivalents, we propose a novel framework to learn bilingual distributed phrase representations for machine translation. We first induce vector representations for words in the source and target language respectively, in their own semantic space. These word v...

Joint and Coupled Bilingual Topic Model Based Sentence Representations for Language Model Adaptation

This paper is concerned with data selection for adapting a language model (LM) in statistical machine translation (SMT), and aims to find the LM training sentences that are topically similar to the translation task. Although the traditional approaches have achieved significant performance gains, they ignore the topic information and the distribution information of words when selecting similar training senten...

Acquiring distributed representations for verb-object pairs by using word2vec

We propose three methods for obtaining distributed representations for verb-object pairs in predicate-argument structures by using word2vec. Word2vec is a method for acquiring distributed representations for a word by retrieving a weight matrix in neural networks. First, we analyze a large amount of text with an HPSG parser; then, we obtain distributed representations for the verb-object pairs...

Journal:
  • Int. J. Comput. Linguistics Appl.

Volume: 5

Publication date: 2014